Abstract

Let $M$ be a random matrix in the orthogonal group $\mathcal{O}_n$, distributed according
to Haar measure, and let $A$ be a fixed $n \times n$ matrix over $\mathbb{R}$ such that
$\operatorname{Tr}(AA^t) = n$. Then the total variation distance of the random variable
$\operatorname{Tr}(AM)$ to a standard normal random variable is bounded by
$\frac{2\sqrt{3}}{n-1}$, and this rate is sharp up to the constant. Analogous results are
obtained for $M$ a random unitary matrix and $A$ a fixed $n \times n$ matrix over
$\mathbb{C}$. The proofs are applications of a new abstract normal approximation theorem
which extends Stein's method of exchangeable pairs to situations in which continuous
symmetries are present.

1. Introduction 

Let $\mathcal{O}_n$ denote the group of $n \times n$ orthogonal matrices, and let $M$ be distributed
according to Haar measure on $\mathcal{O}_n$. Let $A$ be a fixed $n \times n$ matrix over $\mathbb{R}$,
subject to the condition that $\operatorname{Tr}(AA^t) = n$, and let $W = \operatorname{Tr}(AM)$.
D'Aristotile, Diaconis, and Newman showed in [4] that

\[ \sup_{\substack{\operatorname{Tr}(AA^t) = n \\ -\infty < x < \infty}} \left| \mathbb{P}(W \le x) - \Phi(x) \right| \to 0 \]

as $n \to \infty$. Their argument uses classical methods involving sub-subsequences and tightness,
and cannot be improved to yield a theorem for finite $n$. Theorem 4 below gives an explicit
rate of convergence of the law of $W$ to the standard normal distribution in the total variation
metric on probability measures, specifically,

\[ d_{TV}(\mathcal{L}_W, \mathfrak{N}(0,1)) \le \frac{2\sqrt{3}}{n-1} \tag{1} \]

for all $n \ge 2$.
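As a rough numerical illustration of the statement in (1), the following sketch samples Haar-distributed orthogonal matrices by applying Gram-Schmidt to independent Gaussian columns (a standard construction, not one taken from this paper) and checks that $W = \operatorname{Tr}(AM)$ has sample mean near 0 and sample variance near 1 when $A = I$. The function names, the dimension `n = 6`, the sample size, and the seed are illustrative assumptions.

```python
import math
import random


def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix via Gram-Schmidt on Gaussian columns.

    Gram-Schmidt applied to i.i.d. standard Gaussian columns is assumed here
    as the standard way to obtain a Haar-distributed orthogonal matrix.
    """
    cols = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(2):  # a second pass improves numerical orthogonality
            for q in cols:
                dot = sum(vi * qi for vi, qi in zip(v, q))
                v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        cols.append([vi / norm for vi in v])
    # cols[j] is column j; return the matrix as a list of rows
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def trace_product(A, M):
    """Return Tr(AM) = sum over i, j of A[i][j] * M[j][i]."""
    n = len(A)
    return sum(A[i][j] * M[j][i] for i in range(n) for j in range(n))


def simulate_w(A, samples, seed):
    """Draw independent copies of W = Tr(AM) with M sampled as above."""
    rng = random.Random(seed)
    n = len(A)
    return [trace_product(A, haar_orthogonal(n, rng)) for _ in range(samples)]


n = 6
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # Tr(AA^t) = n
draws = simulate_w(identity, samples=4000, seed=1)
mean_w = sum(draws) / len(draws)
var_w = sum((w - mean_w) ** 2 for w in draws) / len(draws)
print(f"sample mean {mean_w:.3f}, sample variance {var_w:.3f}")
```

A simulation of this kind only probes the first two moments; it says nothing about the total variation rate itself, which is what Theorem 4 quantifies.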

The history of this problem begins with the following theorem, first given rigorous proof
by Borel in [2]: let $X$ be a random vector on the unit sphere $S^{n-1}$, and let $X_1$ be the first
coordinate of $X$. Then $\mathbb{P}(\sqrt{n} X_1 \le t) \to \Phi(t)$ as $n \to \infty$, where
$\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^t e^{-x^2/2}\,dx$.
Since the first column of a Haar-distributed orthogonal matrix is uniformly distributed on the
unit sphere, Borel's theorem follows from Theorem 4 by taking $A = \sqrt{n} \oplus 0$. Borel's theorem
was generalized in one direction by Diaconis and Freedman [Hj, who proved the convergence
of the first $k$ coordinates of $\sqrt{n}X$ to independent standard normal random variables in total
variation distance for $k = o(n)$; |H| also contains a detailed history of this problem. This line
of research was further developed in [7], where a total variation bound was given between
an $r \times r$ block of a random orthogonal matrix and an $r \times r$ matrix of independent standard
Gaussians, for $r = O(n^{1/3})$. This was later improved by Jiang (see ^21) to $r = O(n^{1/2})$,
which he proved was sharp. In the same paper, Jiang also showed that given a sequence
of Haar distributed random matrices $\{M_n\}$, there is a sequence of Gaussian matrices $\{Y_n\}$
with $Y_n$ defined on the same probability space as $M_n$ such that if

\[ \epsilon_n = \max_{\substack{1 \le i \le n \\ 1 \le j \le m_n}} \left| \sqrt{n}\, M_{ij} - Y_{ij} \right| \]

with $m_n \le \frac{n}{(\log n)^2}$, then $\epsilon_n \to 0$ in probability as $n \to \infty$. Thus an
$n \times \frac{n}{(\log n)^2}$ block of a Haar distributed matrix can be approximated by a Gaussian
matrix 'in probability'. Theorem 4 gives another sense in which a random orthogonal matrix
is close to a matrix of independent normals by giving a uniform bound of distance to normal
over all linear combinations of entries of $M$.

Another special case of Theorem 4 is $A = I$, so that $W = \operatorname{Tr}(M)$. Diaconis and Mallows
(see 1^) first proved that $\operatorname{Tr}(M)$ is approximately normal; Stein jTHj and Johansson ^3]
later independently obtained fast rates of convergence to normal of $\operatorname{Tr}(M^k)$ for fixed $k$, with
Johansson's rates an improvement on Stein's. In studying eigenvalues of random orthogonal
matrices, Diaconis and Shahshahani [9] extended this to show that the joint distribution
of $\operatorname{Tr}(M), \operatorname{Tr}(M^2), \ldots, \operatorname{Tr}(M^k)$ converges to that of independent normal variables as
$n \to \infty$, for $k$ fixed.

The other source of motivation for theorems like Theorem 4 is Hoeffding's combinatorial
central limit theorem [11], which can be stated as follows. Let $A = (a_{ij})$ be a fixed $n \times n$
matrix over $\mathbb{R}$, normalized to have row and column sums equal to zero and
$\frac{1}{n-1} \sum_{i,j} a_{ij}^2 = 1$. Let $\pi$ be a random permutation in $S_n$, and let
$W(\pi) = \sum_i a_{i\pi(i)}$. Then under certain
conditions on $A$, $W$ is approximately normal. Later, Bolthausen pP proved an explicit rate
of convergence via Stein's method. Note that if

\[ M_{ij} = \begin{cases} 1 & \pi(j) = i \\ 0 & \text{otherwise,} \end{cases} \]

then $W = \operatorname{Tr}(AM)$, and so Hoeffding's theorem is really a theorem about the distribution
of linear functions on the set of permutation matrices.
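The identification of Hoeffding's statistic with a trace can be checked directly. The sketch below builds the permutation matrix with entry $(i, j)$ equal to 1 exactly when $\pi(j) = i$ and compares $\sum_i a_{i\pi(i)}$ with $\operatorname{Tr}(AM)$. Indices are 0-based in the code, and the helper names and the random integer test matrix are illustrative assumptions.

```python
import random


def permutation_matrix(pi):
    """Return M with M[i][j] = 1 if pi[j] == i and 0 otherwise (0-based)."""
    n = len(pi)
    return [[1 if pi[j] == i else 0 for j in range(n)] for i in range(n)]


def trace_product(A, M):
    """Return Tr(AM) = sum over i, j of A[i][j] * M[j][i]."""
    n = len(A)
    return sum(A[i][j] * M[j][i] for i in range(n) for j in range(n))


def hoeffding_statistic(A, pi):
    """Return W(pi) = sum over i of A[i][pi[i]]."""
    return sum(A[i][pi[i]] for i in range(len(pi)))


rng = random.Random(0)
n = 5
A = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(n)]
pi = list(range(n))
rng.shuffle(pi)
M = permutation_matrix(pi)
print(hoeffding_statistic(A, pi), trace_product(A, M))  # the two values agree
```

The check uses integer entries so that the comparison is exact; the row and column normalization in Hoeffding's theorem plays no role in the identity itself.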

The unitary group is another source of many important applications; see, e.g. IH]. In
Section 4, the random variable $\operatorname{Tr}(AM)$ for $A$ a fixed matrix over $\mathbb{C}$ and $M$ a random unitary
matrix distributed according to Haar measure on $\mathcal{U}_n$ is considered. The main theorem of
the section, Theorem 6, gives a bound on the total variation distance of $\operatorname{Re}[\operatorname{Tr}(AM)]$ to
standard normal analogous to that of Theorem 4; this can be viewed as a theorem about
real-linear functions on $\mathcal{U}_n$. Corollary 7 shows that in the limit, the complex random variable
$\operatorname{Tr}(AM)$ is close to standard complex normal. The methods used here cannot be used
directly to prove the convergence of $\operatorname{Tr}(AM)$ to the standard complex normal; they work for
approximation of real-valued random variables only. A version of the present methods in a
multivariate context is forthcoming in [3], which includes a rate of convergence for Corollary 7.

Notation and Conventions. The total variation distance $d_{TV}(\mu, \nu)$ between the measures
$\mu$ and $\nu$ on $\mathbb{R}$ is defined by

\[ d_{TV}(\mu, \nu) = \sup_A \left| \mu(A) - \nu(A) \right|, \]

where the supremum is over measurable sets $A$. This is equivalent to

\[ d_{TV}(\mu, \nu) = \frac{1}{2} \sup_f \left| \int f(t)\,d\mu(t) - \int f(t)\,d\nu(t) \right|, \]

where the supremum is taken over continuous functions which are bounded by 1 and vanish
at infinity; this is the definition used in what follows. The total variation distance between
two random variables $X$ and $Y$ is defined to be the total variation distance between their
distributions:

\[ d_{TV}(X, Y) = \sup_A \left| \mathbb{P}(X \in A) - \mathbb{P}(Y \in A) \right| = \frac{1}{2} \sup_f \left| \mathbb{E}f(X) - \mathbb{E}f(Y) \right|. \]

We will use $\mathfrak{N}(\mu, \sigma^2)$ to denote the normal distribution on $\mathbb{R}$ with mean $\mu$ and variance $\sigma^2$.
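The equivalence of the two definitions is easy to see in a discrete analogue. The following sketch assumes two probability vectors on a small finite set (an assumption made purely for illustration, since the paper works with measures on the real line) and compares the supremum over all subsets with one half of the $\ell^1$ distance, which is what the supremum over functions bounded by 1 reduces to in the finite case.

```python
from itertools import combinations


def tv_half_l1(p, q):
    """Half the l1 distance: (1/2) * sup over |f| <= 1 of |sum f*(p - q)|."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def tv_sup_over_sets(p, q):
    """Brute-force supremum of |p(A) - q(A)| over all subsets A."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            diff = abs(sum(p[i] for i in subset) - sum(q[i] for i in subset))
            best = max(best, diff)
    return best


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(tv_half_l1(p, q), tv_sup_over_sets(p, q))  # both are 0.3 up to rounding
```

The brute-force search is exponential in the size of the set and is only meant to make the definition concrete.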

Acknowledgements. I would like to thank Persi Diaconis for sharing his many insights 
with me. 

2. An abstract normal approximation theorem 

In this section, a general approach for normal approximation to random variables with
continuous symmetries is developed. The ideas which give rise to Theorem 1 below first
appeared in Stein ^Hl, where fast rates of convergence to Gaussian (as $n \to \infty$) were obtained
for $\operatorname{Tr}(M^k)$, with $k \in \mathbb{N}$ fixed and $M$ a random $n \times n$ orthogonal matrix.

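Theorem 1. Suppose that $(W, W_\epsilon)$ is a family of exchangeable pairs defined on a common
probability space with $\mathbb{E}W = 0$ and $\mathbb{E}W^2 = \sigma^2$. Suppose that there are functions $\alpha$ and $\beta$
with

\[ \mathbb{E}|\alpha(\sigma^{-1}W)| < \infty, \qquad \mathbb{E}|\beta(\sigma^{-1}W)| < \infty, \]

and a constant $\lambda$ such that

(i)
\[ \frac{1}{\epsilon^2}\, \mathbb{E}[W_\epsilon - W \mid W] = -\lambda W + o(1)\alpha(W), \]

(ii)
\[ \frac{1}{\epsilon^2}\, \mathbb{E}[(W_\epsilon - W)^2 \mid W] = 2\lambda\sigma^2 + E\sigma^2 + o(1)\beta(W), \]

(iii)
\[ \frac{1}{\epsilon^2}\, \mathbb{E}|W_\epsilon - W|^3 = o(1), \]

where $o(1)$ refers to the limit as $\epsilon \to 0$, with the implied constants deterministic.
Then

\[ d_{TV}(W, Z) \le \frac{1}{\lambda}\, \mathbb{E}|E|, \]

where $Z \sim \mathfrak{N}(0, \sigma^2)$.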
Remark: The factor of $\frac{1}{\epsilon^2}$ in each of the three expressions above could be replaced by a
general function $f(\epsilon)$. In practice, $W_\epsilon$ is typically constructed such that $W_\epsilon - W = O(\epsilon)$.
This makes it clear that $f(\epsilon) = \frac{1}{\epsilon^2}$ is the suitable choice for condition (ii). It is less clear
that $f(\epsilon) = \frac{1}{\epsilon^2}$ is the suitable choice for condition (i). In the applications given here, while
$W_\epsilon - W = O(\epsilon)$, symmetry conditions imply that

\[ \mathbb{E}[W_\epsilon - W \mid W] = O(\epsilon^2). \]



Before beginning the proof, some background on Stein's method is helpful. The following 
lemma is key. 

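Lemma 2 (Stein). Let $Z \sim \mathfrak{N}(0, 1)$. Then

(i) For all $f \in C_o^1(\mathbb{R})$,

\[ \mathbb{E}[f'(Z) - Zf(Z)] = 0. \]

(ii) If $Y$ is a random variable such that

\[ \mathbb{E}[f'(Y) - Yf(Y)] = 0 \]

for all $f \in C_o^1(\mathbb{R})$, then $\mathcal{L}(Y) = \mathcal{L}(Z)$; i.e., $Y$ is also distributed as a standard
Gaussian random variable.

(iii) For $g : \mathbb{R} \to \mathbb{R}$ with $\mathbb{E}|g(Z)| < \infty$ given, the function

\[ U_o g(t) = e^{t^2/2} \int_{-\infty}^t \left[ g(x) - \mathbb{E}g(Z) \right] e^{-x^2/2}\,dx \tag{2} \]

is a solution to the differential equation

\[ f'(x) - xf(x) = g(x) - \mathbb{E}g(Z). \]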

The lemma says that the standard Gaussian distribution $\gamma$ on $\mathbb{R}$ is the unique distribution
with the property that $\int (f'(x) - xf(x))\,d\gamma(x)$ is always zero. The idea of Stein's method
is that if $W$ is a random variable such that $\mathbb{E}[f'(W) - Wf(W)]$ is always small, then the
distribution of $W$ is close to $\gamma$. There are several approaches to bounding this quantity; the
approach taken here is modelled on the method of exchangeable pairs (see ^1]). In any of
the approaches, the following bounds on $U_o$ are useful.

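Lemma 3 (Stein). Let $U_o$ be the operator defined in equation (2). Then

(i) $\|U_o g\|_\infty \le \sqrt{\pi/2}\, \|g - \mathbb{E}g(Z)\|_\infty \le \sqrt{2\pi}\, \|g\|_\infty$

(ii) $\|(U_o g)'\|_\infty \le 2\|g - \mathbb{E}g(Z)\|_\infty \le 4\|g\|_\infty$

(iii) $\|(U_o g)''\|_\infty \le 2\|g'\|_\infty$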
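The operator $U_o$ can be explored numerically. The sketch below evaluates $U_o g$ for the test function $g = \cos$ (for which $\mathbb{E}g(Z) = e^{-1/2}$), checks by central differences that it approximately solves $f'(x) - xf(x) = g(x) - \mathbb{E}g(Z)$, and compares its size on a small grid with the bound in Lemma 3 (i). The choice of $g$, the truncation of the integral at $-10$, the trapezoid rule, and all step sizes are assumptions made for illustration.

```python
import math


def stein_solution(g, mean_g, t, lower=-10.0, steps=20000):
    """Approximate U_o g(t) = e^{t^2/2} * int_{-inf}^t [g(x) - E g(Z)] e^{-x^2/2} dx.

    The integral is truncated at `lower` and evaluated with the trapezoid rule.
    """
    h = (t - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (g(x) - mean_g) * math.exp(-0.5 * x * x)
    return math.exp(0.5 * t * t) * h * total


g = math.cos
mean_g = math.exp(-0.5)  # E cos(Z) = e^{-1/2} for Z standard normal


def ode_residual(t, delta=1e-3):
    """Residual of f'(t) - t f(t) - (g(t) - E g(Z)), with f' by central differences."""
    f_plus = stein_solution(g, mean_g, t + delta)
    f_minus = stein_solution(g, mean_g, t - delta)
    f_t = stein_solution(g, mean_g, t)
    derivative = (f_plus - f_minus) / (2.0 * delta)
    return derivative - t * f_t - (g(t) - mean_g)


grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
residuals = [ode_residual(t) for t in grid]
sup_f = max(abs(stein_solution(g, mean_g, t)) for t in grid)
# sup |g - E g(Z)| for g = cos equals 1 + e^{-1/2}, attained where cos = -1
lemma_bound = math.sqrt(math.pi / 2.0) * (1.0 + mean_g)
print(residuals)
print(sup_f, lemma_bound)
```

The grid is kept to moderate values of $t$ because the factor $e^{t^2/2}$ amplifies quadrature error for large $|t|$.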

With this background, the proof of Theorem 1 is straightforward.

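Proof of Theorem 1. By considering $\sigma^{-1}W$ instead of $W$, we may without loss assume that
$\sigma = 1$. For $g \in C_o^\infty(\mathbb{R})$ fixed, let $f$ be the solution given in equation (2) to the differential
equation

\[ f'(x) - xf(x) = g(x) - \mathbb{E}g(Z). \]

Fix $\epsilon$. By the exchangeability of $(W, W_\epsilon)$,

\begin{align*}
0 &= \mathbb{E}\big[ (W_\epsilon - W)(f(W_\epsilon) + f(W)) \big] \\
  &= \mathbb{E}\big[ (W_\epsilon - W)(f(W_\epsilon) - f(W)) + 2(W_\epsilon - W)f(W) \big] \tag{3} \\
  &= \mathbb{E}\big[ \mathbb{E}[(W_\epsilon - W)^2 \mid W]\, f'(W) + 2\,\mathbb{E}[(W_\epsilon - W) \mid W]\, f(W) + R \big],
\end{align*}

where $R$ is the error in the derivative approximation. By Taylor's theorem and Lemma 3,

\[ |R| \le \frac{\|f''\|_\infty}{2}\, |W_\epsilon - W|^3 \le \|g'\|_\infty\, |W_\epsilon - W|^3, \]

and so

\[ \lim_{\epsilon \to 0} \frac{1}{\epsilon^2}\, \mathbb{E}|R| = 0. \]

Dividing both sides of (3) by $2\lambda\epsilon^2$ and taking the limit as $\epsilon \to 0$ gives:

\[ 0 = \mathbb{E}\left[ f'(W) - Wf(W) + \frac{E}{2\lambda} f'(W) \right] = \mathbb{E}\left[ g(W) - \mathbb{E}g(Z) + \frac{E}{2\lambda} f'(W) \right]. \]

Rearranging and applying the bound on $\|f'\|_\infty$ from Lemma 3 yields

\[ \left| \mathbb{E}g(W) - \mathbb{E}g(Z) \right| \le \frac{\|f'\|_\infty}{2\lambda}\, \mathbb{E}|E| \le \frac{2\|g\|_\infty}{\lambda}\, \mathbb{E}|E|. \]

Since $C_o^\infty(\mathbb{R})$ is dense (with respect to the supremum norm) in the class of bounded continuous
functions vanishing at infinity, this completes the proof. □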
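The last step of the proof, from test functions to the total variation distance, can be written out as follows. This is a short sketch which assumes the convention of the Notation section, in which $d_{TV}$ is one half of a supremum over continuous functions bounded by 1 and vanishing at infinity, together with the bound $\|f'\|_\infty \le 4\|g\|_\infty$ of Lemma 3 (ii) applied to $f = U_o g$.

```latex
\begin{align*}
\left| \mathbb{E} g(W) - \mathbb{E} g(Z) \right|
  &\le \frac{\|f'\|_\infty}{2\lambda}\, \mathbb{E}|E|
   \le \frac{4\|g\|_\infty}{2\lambda}\, \mathbb{E}|E|
   = \frac{2\|g\|_\infty}{\lambda}\, \mathbb{E}|E| , \\
d_{TV}(W, Z)
  &= \frac{1}{2} \sup_{\|g\|_\infty \le 1} \left| \mathbb{E} g(W) - \mathbb{E} g(Z) \right|
   \le \frac{1}{2} \cdot \frac{2}{\lambda}\, \mathbb{E}|E|
   = \frac{1}{\lambda}\, \mathbb{E}|E| .
\end{align*}
```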



3. The Orthogonal Group 

This section is mainly devoted to the proof of the following theorem. 

Theorem 4. Let $A$ be a fixed $n \times n$ matrix over $\mathbb{R}$ such that $\operatorname{Tr}(AA^t) = n$, $M \in \mathcal{O}_n$
distributed according to Haar measure, and $W = \operatorname{Tr}(AM)$. Let $Z$ be a standard normal
random variable. Then for $n > 1$,

\[ d_{TV}(W, Z) \le \frac{2\sqrt{3}}{n-1}. \tag{4} \]

The bound in Theorem 4 is sharp up to the constant; consider the matrix $A = \sqrt{n} \oplus 0$,
where $0$ is the $(n-1) \times (n-1)$ matrix with all zeros. For this $A$, Theorem 4 reproves the
following theorem, proved in jSI with slightly worse constant.

Theorem 5. Let $x \in \sqrt{n}\, S^{n-1}$ be uniformly distributed, and let $Z$ be a standard normal
random variable. Then

\[ d_{TV}(x_1, Z) \le \frac{2\sqrt{3}}{n-1}. \]

It is shown in that the order of this error term is correct. 

Proof of Theorem 4. First note that one can assume without loss of generality that $A$ is
diagonal: let $A = UDV$ be the singular value decomposition of $A$. Then $W = \operatorname{Tr}(UDVM) =
\operatorname{Tr}(DVMU)$, and the distribution of $VMU$ is the same as the distribution of $M$ by the
translation invariance of Haar measure.
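Now define the pair $(W, W_\epsilon)$ for each $\epsilon$ as follows. Choose $H = (h_{ij}) \in \mathcal{O}_n$ according to
Haar measure, independent of $M$, and let $M_\epsilon = HA_\epsilon H^tM$, where

\[ A_\epsilon = \begin{bmatrix} \sqrt{1-\epsilon^2} & \epsilon \\ -\epsilon & \sqrt{1-\epsilon^2} \end{bmatrix} \oplus I_{n-2}; \]

thus $M_\epsilon$ can be thought of as a small random rotation of $M$. Let $W_\epsilon = W(M_\epsilon)$; $(W, W_\epsilon)$ is
an exchangeable pair by construction.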
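The construction of the perturbed matrix can be sketched numerically. The code below assumes that $A_\epsilon$ is the rotation with entries $\sqrt{1-\epsilon^2}$, $\epsilon$, $-\epsilon$, $\sqrt{1-\epsilon^2}$ in the upper-left $2 \times 2$ block and the identity elsewhere, samples $M$ and $H$ by Gram-Schmidt on Gaussian columns (a standard construction, not taken from this paper), and checks that $M_\epsilon = HA_\epsilon H^tM$ stays orthogonal while $(W_\epsilon - W)/\epsilon$ approaches $\operatorname{Tr}(AKC_2K^tM)$, with $(KC_2K^t)_{ij} = h_{i1}h_{j2} - h_{i2}h_{j1}$. All helper names, the dimension, and the seed are illustrative assumptions.

```python
import math
import random


def haar_orthogonal(n, rng):
    """Sample an orthogonal matrix via Gram-Schmidt on i.i.d. Gaussian columns."""
    cols = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(2):  # a second pass improves numerical orthogonality
            for q in cols:
                dot = sum(vi * qi for vi, qi in zip(v, q))
                v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        cols.append([vi / norm for vi in v])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def trace_product(A, M):
    """Return Tr(AM)."""
    n = len(A)
    return sum(A[i][j] * M[j][i] for i in range(n) for j in range(n))


def rotation(n, eps):
    """A_eps: small rotation in the first two coordinates, identity elsewhere."""
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c = math.sqrt(1.0 - eps * eps)
    R[0][0], R[0][1], R[1][0], R[1][1] = c, eps, -eps, c
    return R


def perturb(M, H, eps):
    """Return M_eps = H A_eps H^t M."""
    return matmul(matmul(matmul(H, rotation(len(M), eps)), transpose(H)), M)


rng = random.Random(0)
n = 5
M = haar_orthogonal(n, rng)
H = haar_orthogonal(n, rng)
A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # diagonal, Tr(AA^t) = n

# First-order term Tr(A K C_2 K^t M); columns 1 and 2 of H are indices 0 and 1 here
KCK = [[H[i][0] * H[j][1] - H[i][1] * H[j][0] for j in range(n)] for i in range(n)]
first_order = trace_product(A, matmul(KCK, M))

W = trace_product(A, M)
for eps in (1e-1, 1e-2, 1e-3):
    W_eps = trace_product(A, perturb(M, H, eps))
    print(eps, (W_eps - W) / eps, first_order)
```

The printed ratios should differ from the first-order term by an amount of order $\epsilon$, which is the behaviour that the expansion of $W_\epsilon - W$ in the text makes precise.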

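It is convenient to rewrite $M_\epsilon$ as follows. Let $I_2$ be the $2 \times 2$ identity matrix, $K$ the $n \times 2$
matrix consisting of the first two columns of $H$, and let

\[ C_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. \]

Then

\begin{align*}
M_\epsilon &= M + K\left[ \left( \sqrt{1-\epsilon^2} - 1 \right) I_2 + \epsilon C_2 \right] K^tM \\
 &= M + K\left[ \left( -\frac{\epsilon^2}{2} + O(\epsilon^4) \right) I_2 + \epsilon C_2 \right] K^tM,
\end{align*}

and so

\[ W_\epsilon - W = \epsilon \left[ \left( -\frac{\epsilon}{2} + O(\epsilon^3) \right) \operatorname{Tr}(AKK^tM) + \operatorname{Tr}(AKC_2K^tM) \right]. \tag{5} \]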



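Now, the distribution of $H$ is unchanged by multiplying a fixed row or column by $-1$ and
$H$ is orthogonal, thus $\mathbb{E}[h_{ij}h_{k\ell}] = \frac{1}{n}\delta_{ik}\delta_{j\ell}$. This implies that

\[ \mathbb{E}[KK^t] = \frac{2}{n}\, I \]

and

\[ \mathbb{E}[KC_2K^t] = 0; \]

combining this with (5) yields:

\begin{align*}
\frac{n}{\epsilon^2}\, \mathbb{E}[(W_\epsilon - W) \mid W]
 &= -\frac{n}{2}\, \mathbb{E}\big[ \mathbb{E}[\operatorname{Tr}(AKK^tM) \mid M] \,\big|\, W \big] + \frac{n}{\epsilon}\, \mathbb{E}\big[ \mathbb{E}[\operatorname{Tr}(AKC_2K^tM) \mid M] \,\big|\, W \big] + O(\epsilon) \\
 &= -\mathbb{E}\big[ \mathbb{E}[\operatorname{Tr}(AM) \mid M] \,\big|\, W \big] + O(\epsilon) \\
 &= -W + O(\epsilon),
\end{align*}

where the independence of $M$ and $H$ has been used to get the second line, and the implied
constants in the $O(\epsilon)$ here and in what follows may depend on $n$. Condition (i) of Theorem
1 is thus satisfied with $\lambda = \frac{1}{n}$.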
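The two moment identities behind condition (i) follow entrywise from $\mathbb{E}[h_{ij}h_{k\ell}] = \frac{1}{n}\delta_{ik}\delta_{j\ell}$. The sketch below writes this out, reading $C_2$ as the $2 \times 2$ matrix with rows $(0, 1)$ and $(-1, 0)$ (the reading assumed here from the expansion of $M_\epsilon$), and then recovers $\lambda = \frac{1}{n}$ from (5) using the independence of $H$ and $M$.

```latex
\begin{align*}
(KK^t)_{ij} &= h_{i1}h_{j1} + h_{i2}h_{j2},
  & \mathbb{E}\big[(KK^t)_{ij}\big] &= \tfrac{1}{n}\delta_{ij} + \tfrac{1}{n}\delta_{ij} = \tfrac{2}{n}\delta_{ij}, \\
(KC_2K^t)_{ij} &= h_{i1}h_{j2} - h_{i2}h_{j1},
  & \mathbb{E}\big[(KC_2K^t)_{ij}\big] &= 0 \quad \text{(distinct columns are uncorrelated)},
\end{align*}
\begin{align*}
\frac{1}{\epsilon^2}\, \mathbb{E}[W_\epsilon - W \mid W]
  &= \frac{1}{\epsilon} \Big( -\frac{\epsilon}{2} + O(\epsilon^3) \Big)
     \operatorname{Tr}\big( A\, \mathbb{E}[KK^t]\, M \big)
   + \frac{1}{\epsilon} \operatorname{Tr}\big( A\, \mathbb{E}[KC_2K^t]\, M \big) \\
  &= -\frac{1}{n}\, W + O(\epsilon^2).
\end{align*}
```

In particular the remainder is $O(\epsilon)$, which is all that condition (i) requires.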
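Recall now that $A$ is assumed to be diagonal. The second condition of Theorem 1 can also
be verified using the expression in (5) as follows.

\begin{align*}
\frac{n}{2\epsilon^2}\, \mathbb{E}[(W_\epsilon - W)^2 \mid W]
 &= \frac{n}{2}\, \mathbb{E}\big[ \mathbb{E}[(\operatorname{Tr}(AKC_2K^tM))^2 \mid M] \,\big|\, W \big] + O(\epsilon) \tag{6} \\
 &= \frac{n}{2}\, \mathbb{E}\Big[ \sum_{\substack{i,\, j \\ i' \ne i,\ j' \ne j}} a_{ii}a_{jj}m_{i'i}m_{j'j}\, (h_{i1}h_{i'2} - h_{i2}h_{i'1})(h_{j1}h_{j'2} - h_{j2}h_{j'1}) \,\Big|\, W \Big] + O(\epsilon),
\end{align*}

where the conditions on $i'$ and $j'$ are justified as the expression inside the expectation is
identically zero when either $i = i'$ or $j = j'$.

Standard techniques are available for computing the mixed moments of entries of $H$; see
e.g. ^ni; section 4.2. Using these techniques and the independence of $M$ and $H$ gives that
for $i' \ne i$ and $j' \ne j$,

\[ \mathbb{E}\big[ (h_{i1}h_{i'2} - h_{i2}h_{i'1})(h_{j1}h_{j'2} - h_{j2}h_{j'1}) \,\big|\, M \big] = \frac{2}{n(n-1)} \left[ \delta_{ij}\delta_{i'j'} - \delta_{ij'}\delta_{ji'} \right]; \tag{7} \]

putting this into (6) yields

\begin{align*}
\frac{n}{2\epsilon^2}\, \mathbb{E}[(W_\epsilon - W)^2 \mid W]
 &= \frac{1}{n-1}\, \mathbb{E}\Big[ \sum_{i' \ne i} a_{ii}^2 m_{i'i}^2 - \sum_{i' \ne i} a_{ii}a_{i'i'}m_{i'i}m_{ii'} \,\Big|\, W \Big] + O(\epsilon) \\
 &= \frac{1}{n-1}\, \mathbb{E}\Big[ \sum_i a_{ii}^2 - \operatorname{Tr}((AM)^2) \,\Big|\, W \Big] + O(\epsilon) \\
 &= 1 + \frac{1}{n-1}\, \mathbb{E}\big[ 1 - \operatorname{Tr}((AM)^2) \,\big|\, W \big] + O(\epsilon);
\end{align*}

thus

\[ \lim_{\epsilon \to 0} \frac{1}{\epsilon^2}\, \mathbb{E}[(W_\epsilon - W)^2 \mid W] = \frac{2}{n} + \frac{2}{n(n-1)}\, \mathbb{E}\big[ 1 - \operatorname{Tr}((AM)^2) \,\big|\, W \big], \tag{8} \]

and so

\[ E = \frac{2}{n(n-1)}\, \mathbb{E}\big[ 1 - \operatorname{Tr}((AM)^2) \,\big|\, W \big]. \tag{9} \]

Finally, (5) gives immediately that

\[ \mathbb{E}\big[ |W_\epsilon - W|^3 \,\big|\, W \big] = O(\epsilon^3). \]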

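It remains to bound $n\mathbb{E}|E|$. First,

\[ \mathbb{E}\big[ \operatorname{Tr}((AM)^2) \big] = \mathbb{E}\Big[ \sum_{i,j} a_{ii}a_{jj}m_{ij}m_{ji} \Big] = \frac{1}{n} \sum_i a_{ii}^2 = 1, \tag{10} \]

and

\begin{align*}
\mathbb{E}\big[ (\operatorname{Tr}((AM)^2))^2 \big]
 &= \mathbb{E}\Big[ \sum_{i,j,k,\ell} a_{ii}a_{jj}a_{kk}a_{\ell\ell}\, m_{ij}m_{ji}m_{k\ell}m_{\ell k} \Big] \\
 &= \sum_{i,j,k,\ell} a_{ii}a_{jj}a_{kk}a_{\ell\ell} \Big[ \frac{n+1}{(n-1)n(n+2)} \big[ \delta_{ij}\delta_{k\ell}(1-\delta_{ik}) + \delta_{ik}\delta_{j\ell}(1-\delta_{ij}) + \delta_{i\ell}\delta_{jk}(1-\delta_{ij}) \big] \\
 &\qquad\qquad + \frac{3}{n(n+2)}\, \mathbb{I}(i=j=k=\ell) \Big] \\
 &= \frac{n+1}{(n-1)n(n+2)} \Big( \sum_{i \ne k} a_{ii}^2a_{kk}^2 + \sum_{i \ne j} a_{ii}^2a_{jj}^2 + \sum_{i \ne j} a_{ii}^2a_{jj}^2 \Big) + \frac{3}{n(n+2)} \sum_i a_{ii}^4.
\end{align*}

Now,

\[ \sum_{i \ne j} a_{ii}^2a_{jj}^2 = \Big( \sum_i a_{ii}^2 \Big)^2 - \sum_i a_{ii}^4 = n^2 - \sum_i a_{ii}^4. \]

Applying this above gives

\begin{align*}
\mathbb{E}\big[ (\operatorname{Tr}((AM)^2))^2 \big]
 &= \frac{3(n+1)n^2}{(n-1)n(n+2)} - \left[ \frac{3(n+1)}{(n-1)n(n+2)} - \frac{3}{n(n+2)} \right] \sum_i a_{ii}^4 \\
 &\le 3 + \frac{6}{(n-1)(n+2)}.
\end{align*}

Putting these estimates into Theorem 1 gives:

\[ d_{TV}(W, Z) \le \frac{2\sqrt{2 + \frac{6}{(n-1)(n+2)}}}{n-1}. \tag{12} \]

Noting that $\frac{6}{(n-1)(n+2)} \le 1$ for $n \ge 3$ and that the bound in Theorem 4 is trivially true for
$n = 2$ completes the proof. □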
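A plausible reading of how (12) follows from (9) and the moment estimates is through the Cauchy-Schwarz inequality: since $\mathbb{E}[\operatorname{Tr}((AM)^2)] = 1$, one has $n\mathbb{E}|E| \le \frac{2}{n-1}\sqrt{\mathbb{E}[(\operatorname{Tr}((AM)^2))^2] - 1}$. The sketch below checks the arithmetic behind the last two steps: the exact identity $\frac{3(n+1)n}{(n-1)(n+2)} = 3 + \frac{6}{(n-1)(n+2)}$ and the comparison of the right-hand side of (12) with $\frac{2\sqrt{3}}{n-1}$. The function names and the range of $n$ are illustrative assumptions.

```python
import math
from fractions import Fraction


def second_moment_bound(n):
    """Exact value of 3(n+1)n / ((n-1)(n+2)), the bound on E[(Tr((AM)^2))^2]."""
    return Fraction(3 * (n + 1) * n, (n - 1) * (n + 2))


def refined_tv_bound(n):
    """Right-hand side of (12): 2 * sqrt(2 + 6/((n-1)(n+2))) / (n-1)."""
    return 2.0 * math.sqrt(2.0 + 6.0 / ((n - 1) * (n + 2))) / (n - 1)


def simple_tv_bound(n):
    """Right-hand side of (4): 2 * sqrt(3) / (n-1)."""
    return 2.0 * math.sqrt(3.0) / (n - 1)


for n in (2, 3, 5, 10, 100):
    print(n, float(second_moment_bound(n)), refined_tv_bound(n), simple_tv_bound(n))
```

For $n = 2$ the refined expression exceeds $2\sqrt{3}$, but both quantities are larger than 1 there, so the stated bound holds trivially, as the proof notes.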



4. The Unitary Group 

Now let $M \in \mathcal{U}_n$ be distributed according to Haar measure, $A$ be an $n \times n$ matrix over
$\mathbb{C}$, and $W = \operatorname{Tr}(AM)$. In g] it was shown that if $M = \Gamma + i\Lambda$ and $A$ and $B$ are fixed
real diagonal matrices with $\operatorname{Tr}(AA^*) = \operatorname{Tr}(BB^*) = n$, then $\operatorname{Tr}(A\Gamma) + i\operatorname{Tr}(B\Lambda)$ converges in
distribution to a standard complex normal random variable. This implies in particular that
$\operatorname{Re}(W)$ converges in distribution to $\mathfrak{N}\left(0, \frac{1}{2}\right)$. The main theorem of this section gives a rate
of this convergence in total variation distance.

A more natural question might be the convergence of $W$ to a standard complex normal random
variable. As this is a multivariate problem, Theorem 1 cannot be applied. A multivariate
version of Theorem 1 is forthcoming in [3], which also includes a rate of convergence of $W$
to a standard complex Gaussian random variable.



LINEAR FUNCTIONS ON THE CLASSICAL MATRIX GROUPS 






Theorem 6. With $M$, $A$, and $W$ as above, let $W_\theta$ be the inner product of $W$ with the unit
vector making angle $\theta$ with the real axis. Then

\[ d_{TV}\left( W_\theta, \mathfrak{N}\left(0, \tfrac{1}{2}\right) \right) \le \frac{c}{n} \tag{13} \]

for a constant $c$ which is independent of $\theta$.

The constant $c$ is asymptotically equal to $2\sqrt{2}$; for $n > 8$ it can be taken to be 4.



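Proof. To prove the theorem, first note that it suffices to consider the case $\theta = 0$, that is, to
prove that

\[ d_{TV}\left( \operatorname{Re}(W), \mathfrak{N}\left(0, \tfrac{1}{2}\right) \right) \le \frac{c}{n}. \]

The theorem then follows as stated since the distribution of $W$ is invariant under multiplication
by any complex number of unit modulus. Also, $A$ can again be assumed diagonal with
positive real entries by the singular value decomposition.

The proof is almost identical to the orthogonal case. Let $H \in \mathcal{U}_n$ be a random unitary
matrix, independent of $M$, and let $M_\epsilon = HA_\epsilon H^*M$, where

\[ A_\epsilon = \begin{bmatrix} \sqrt{1-\epsilon^2} & \epsilon \\ -\epsilon & \sqrt{1-\epsilon^2} \end{bmatrix} \oplus I_{n-2}. \]

Let $W_\epsilon = W(M_\epsilon)$.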

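Let $I_2$ be the $2 \times 2$ identity matrix, $K$ the $n \times 2$ matrix consisting of the first two columns
of $H$, and let

\[ C_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. \]

Then

\begin{align*}
W_\epsilon - W &= \operatorname{Tr}\left[ \left( -\frac{\epsilon^2}{2} + O(\epsilon^4) \right) AKK^*M + \epsilon AKC_2K^*M \right] \tag{14} \\
 &= \epsilon \left[ \left( -\frac{\epsilon}{2} + O(\epsilon^3) \right) \operatorname{Tr}(AKK^*M) + \operatorname{Tr}(AKC_2K^*M) \right].
\end{align*}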



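Theorem 1 is applied to the pair $(\operatorname{Re}(W), \operatorname{Re}(W_\epsilon))$. As in the orthogonal case, to verify the conditions
of Theorem 1 various mixed moments of the entries of $H$ are needed. The relevant unitary
integrals can also be found in ^U], section 4.2. They imply in particular that

\[ \mathbb{E}[KK^*] = \frac{2}{n}\, I, \tag{15} \]

\[ \mathbb{E}[KC_2K^*] = 0, \tag{16} \]

thus

\[ \lim_{\epsilon \to 0} \frac{n}{\epsilon^2}\, \mathbb{E}[W_\epsilon - W \mid W] = -W; \tag{17} \]

taking real parts, condition (i) is satisfied with $\lambda = \frac{1}{n}$. Also by (14),

\begin{align*}
\lim_{\epsilon \to 0} \frac{n}{2\epsilon^2}\, \mathbb{E}\big[ (\operatorname{Re}(W_\epsilon) - \operatorname{Re}(W))^2 \,\big|\, W \big]
 &= \frac{n}{2}\, \mathbb{E}\big[ (\operatorname{Re}(\operatorname{Tr}(AKC_2K^*M)))^2 \,\big|\, W \big] \tag{18} \\
 &= \frac{n}{4} \operatorname{Re} \mathbb{E}\Big[ \sum_{i,j,k,\ell} \Big( a_{ii}m_{ji}a_{kk}m_{\ell k}\, (h_{i1}\overline{h_{j2}} - h_{i2}\overline{h_{j1}})(h_{k1}\overline{h_{\ell 2}} - h_{k2}\overline{h_{\ell 1}}) \\
 &\qquad + a_{ii}m_{ji}a_{kk}\overline{m_{\ell k}}\, (h_{i1}\overline{h_{j2}} - h_{i2}\overline{h_{j1}})\overline{(h_{k1}\overline{h_{\ell 2}} - h_{k2}\overline{h_{\ell 1}})} \Big) \,\Big|\, W \Big].
\end{align*}

Using the formulae from JU], it is straightforward to show that

\begin{align*}
\mathbb{E}\big[ (h_{i1}\overline{h_{j2}} - h_{i2}\overline{h_{j1}})(h_{k1}\overline{h_{\ell 2}} - h_{k2}\overline{h_{\ell 1}}) \big]
 &= -\frac{2\delta_{i\ell}\delta_{jk}(1-\delta_{ij})}{(n-1)(n+1)} + \frac{2\delta_{ij}\delta_{k\ell}(1-\delta_{ik})}{(n-1)n(n+1)} - \frac{2\,\mathbb{I}(i=j=k=\ell)}{n(n+1)} \tag{19}
\end{align*}

and

\begin{align*}
\mathbb{E}\big[ (h_{i1}\overline{h_{j2}} - h_{i2}\overline{h_{j1}})\overline{(h_{k1}\overline{h_{\ell 2}} - h_{k2}\overline{h_{\ell 1}})} \big]
 &= \frac{2\delta_{ik}\delta_{j\ell}(1-\delta_{ij})}{(n-1)(n+1)} - \frac{2\delta_{ij}\delta_{k\ell}(1-\delta_{ik})}{n(n-1)(n+1)} + \frac{2\,\mathbb{I}(i=j=k=\ell)}{n(n+1)}. \tag{20}
\end{align*}

Let $\sum'_{i,j}$ stand for summing over all pairs where $i$ and $j$ are distinct. Putting (19)
and (20) into (18) and using the independence of $M$ and $H$ gives: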
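\begin{align*}
\lim_{\epsilon \to 0} \frac{n}{2\epsilon^2}\, \mathbb{E}\big[ (\operatorname{Re}(W_\epsilon) - \operatorname{Re}(W))^2 \,\big|\, W \big]
 &= \frac{n}{2(n-1)(n+1)} \operatorname{Re} \mathbb{E}\Big[ \sum_{i,j,k,\ell} a_{ii}m_{ji}a_{kk}m_{\ell k} \Big( -\delta_{i\ell}\delta_{jk}(1-\delta_{ij}) + \frac{1}{n}\delta_{ij}\delta_{k\ell}(1-\delta_{ik}) - \frac{n-1}{n}\, \mathbb{I}(i=j=k=\ell) \Big) \\
 &\qquad + \sum_{i,j,k,\ell} a_{ii}m_{ji}a_{kk}\overline{m_{\ell k}} \Big( \delta_{ik}\delta_{j\ell}(1-\delta_{ij}) - \frac{1}{n}\delta_{ij}\delta_{k\ell}(1-\delta_{ik}) + \frac{n-1}{n}\, \mathbb{I}(i=j=k=\ell) \Big) \,\Big|\, W \Big] \\
 &= \frac{n}{2(n-1)(n+1)} \operatorname{Re} \mathbb{E}\Big[ -{\sum_{i,j}}' a_{ii}a_{jj}m_{ij}m_{ji} + \frac{1}{n} {\sum_{i,k}}' a_{ii}a_{kk}m_{ii}m_{kk} - \frac{n-1}{n} \sum_i a_{ii}^2 m_{ii}^2 \\
 &\qquad + {\sum_{i,j}}' a_{ii}^2 |m_{ji}|^2 - \frac{1}{n} {\sum_{i,k}}' a_{ii}a_{kk}m_{ii}\overline{m_{kk}} + \frac{n-1}{n} \sum_i a_{ii}^2 |m_{ii}|^2 \,\Big|\, W \Big] \\
 &= \frac{n}{2(n-1)(n+1)} \operatorname{Re} \mathbb{E}\Big[ -\Big( \operatorname{Tr}((AM)^2) - \sum_i (AM)_{ii}^2 \Big) + \frac{1}{n} \Big( W^2 - \sum_i (AM)_{ii}^2 \Big) - \frac{n-1}{n} \sum_i (AM)_{ii}^2 \\
 &\qquad + \Big( n - \sum_i |(AM)_{ii}|^2 \Big) - \frac{1}{n} \Big( |W|^2 - \sum_i |(AM)_{ii}|^2 \Big) + \frac{n-1}{n} \sum_i |(AM)_{ii}|^2 \,\Big|\, W \Big] \\
 &= \frac{1}{2} + \frac{1}{2(n-1)(n+1)} - \frac{n}{2(n-1)(n+1)} \operatorname{Re} \mathbb{E}\Big[ \operatorname{Tr}((AM)^2) - \frac{1}{n} \big( W^2 - |W|^2 \big) \,\Big|\, W \Big].
\end{align*}

Since $\sigma^2 = \frac{1}{2}$ and $\lambda = \frac{1}{n}$, condition (ii) of Theorem 1 is thus satisfied with

\[ nE = \frac{2}{(n-1)(n+1)} - \frac{2n}{(n-1)(n+1)} \operatorname{Re} \mathbb{E}\Big[ \operatorname{Tr}((AM)^2) - \frac{1}{n} \big( W^2 - |W|^2 \big) \,\Big|\, W \Big]. \tag{21} \]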

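It remains to estimate $n\mathbb{E}|E|$. First,

\[ \mathbb{E}\left| \operatorname{Re} \mathbb{E}\Big[ -\operatorname{Tr}((AM)^2) + \frac{1}{n} \big( W^2 - |W|^2 \big) \,\Big|\, W \Big] \right| \le \mathbb{E}\big| \operatorname{Tr}((AM)^2) \big| + \frac{2}{n}\, \mathbb{E}|W|^2, \]

and

\begin{align*}
\mathbb{E}\big| \operatorname{Tr}((AM)^2) \big|
 &\le \sqrt{ \mathbb{E}\Big[ \sum_{i,j,k,\ell} a_{ii}a_{jj}m_{ij}m_{ji}\, a_{kk}a_{\ell\ell}\overline{m_{k\ell}}\,\overline{m_{\ell k}} \Big] } \\
 &= \sqrt{ \sum_{i,j,k,\ell} a_{ii}a_{jj}a_{kk}a_{\ell\ell}\, \mathbb{E}\big[ m_{ij}m_{ji}\overline{m_{k\ell}}\,\overline{m_{\ell k}} \big] } \\
 &= \sqrt{ \frac{2n^2}{(n-1)(n+1)} - \frac{2}{(n-1)n(n+1)} \sum_i a_{ii}^4 } \\
 &\le \sqrt{ 2 + \frac{2}{n^2 - 1} },
\end{align*}

using the formulae of ^U] to evaluate the integrals.

Next,

\[ \mathbb{E}|W|^2 = \mathbb{E}\Big[ \sum_{i,j} a_{ii}a_{jj}m_{ii}\overline{m_{jj}} \Big] = \frac{1}{n} \sum_i a_{ii}^2 = 1. \]

Putting these estimates into (21) proves the theorem.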



□ 



Theorem 6 yields the following bivariate corollary, which can also be seen as a corollary
of the main unitary lemma of j^.

Corollary 7. Let $M$ be a random unitary matrix, $A$ a fixed $n \times n$ matrix over $\mathbb{C}$ with
$\operatorname{Tr}(AA^*) = n$, and let $W = \operatorname{Tr}(AM)$. Then the distribution of $W$ converges to the standard
complex normal distribution in the weak-star topology.

Proof. The result follows immediately from Theorem 6 by considering the characteristic
function of $W$. □